ClassMate: A System for Automated Event Extraction from Course Websites

Authors

  • Ashutosh Kulkarni
  • Harry Robertson
Abstract

Websites contain a huge amount of time-critical data in highly unstructured and heterogeneous form. Information Extraction systems can extract relevant entities and relationships from these sites, and identify, classify and categorize them. In this paper, we present ClassMate, a complete system for extracting key course-related events from university course websites. ClassMate pipelines web data through a Named Entity Recognition module, a window-based event extractor, and a K-Means clustering-based classifier.
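The abstract does not describe how the window-based event extractor works, so the following is only a minimal sketch of the general idea, not ClassMate's implementation. It assumes a date entity is a month name followed by a day number, and that an event is reported when a course-related keyword falls within a fixed token window around that date. The keyword list, the date pattern, the function name `extract_events`, and the default window size are all hypothetical.

```python
import re

# Hypothetical keyword list; not taken from the paper.
EVENT_KEYWORDS = {"exam", "midterm", "final", "homework", "assignment", "project", "quiz"}

# Assumed date form: a month name (full or abbreviated) followed by a day number.
MONTH_PATTERN = re.compile(
    r"^(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)$",
    re.IGNORECASE,
)


def extract_events(text, window=5):
    """Return date mentions that have an event keyword within `window` tokens."""
    # Split into alphabetic words and digit runs; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    events = []
    for i, tok in enumerate(tokens):
        is_date = (
            MONTH_PATTERN.match(tok)
            and i + 1 < len(tokens)
            and tokens[i + 1].isdigit()
        )
        if not is_date:
            continue
        # Window covers `window` tokens on each side of the two-token date.
        start = max(0, i - window)
        end = min(len(tokens), i + 2 + window)
        context = tokens[start:end]
        keywords = [t.lower() for t in context if t.lower() in EVENT_KEYWORDS]
        if keywords:
            events.append(
                {
                    "date": f"{tok} {tokens[i + 1]}",
                    "keywords": keywords,
                    "context": " ".join(context),
                }
            )
    return events


if __name__ == "__main__":
    sample = "Homework 2 is due on October 14 in class. Office hours are unchanged."
    for event in extract_events(sample):
        print(event["date"], event["keywords"])
```

In a pipeline like the one the abstract outlines, the date and keyword detection would presumably come from the Named Entity Recognition module rather than from regular expressions, and the extracted windows would then be passed to the clustering-based classifier.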

Similar resources

A pipeline to extract drug-adverse event pairs from multiple data sources

BACKGROUND Pharmacovigilance aims to uncover and understand harmful side-effects of drugs, termed adverse events (AEs). Although the current process of pharmacovigilance is very systematic, the increasing amount of information available in specialized health-related websites as well as the exponential growth in medical literature presents a unique opportunity to supplement traditional adverse e...

An Image-based Feature Extraction Approach for Phishing Website Detection

Phishing website creators and anti-phishing defenders are in an arms race. Cloning a website is fairly easy and can be automated by any junior programmer. Attempting to recognize numerous phishing links posted in the wild e.g. on social media sites or in email is a constant game of escalation. Automated phishing website detection systems need both speed and accuracy to win. We present a new met...

DIADEM: Thousands of Websites to a Single Database

The web is overflowing with implicitly structured data, spread over hundreds of thousands of sites, hidden deep behind search forms, or siloed in marketplaces, only accessible as HTML. Automatic extraction of structured data at the scale of thousands of websites has long proven elusive, despite its central role in the “web of data”. Through an extensive evaluation spanning over 10000 web sites ...

WEAVE: An Automated System for Collating Unstructured Data from WEB and Legacy Sources to Enhance the MRO Supply Chain

Gleaning consistent and complete data from multiple sources of unstructured information is often a difficult and time consuming process. In this paper we outline the WEAVE® system which automates the structuring and collating of unstructured data from multiple on-line Websites. WEAVE® is presented in the context of the maintenance, repair, and operations supply chain. The underlying knowledge r...

Weave: an Automated System for Collating Unstructured Data

Gleaning consistent and complete data from multiple sources of unstructured information is often a difficult and time consuming process. In this paper we outline the WEAVE® system which automates the structuring and collating of unstructured data from multiple on-line Websites. WEAVE® is presented in the context of the maintenance, repair, and operations supply chain. The underlying knowledge r...

Publication date: 2008